ABSTRACT

Summary: Experimental MSⁿ mass spectral libraries currently do not
adequately cover chemical space. This limits the robust annotation of 
metabolites in metabolomics studies of complex biological samples. 
In silico fragmentation libraries would improve the identification of 
compounds from experimental multistage fragmentation data when 
experimental reference data are unavailable. Here, we present a 
freely available software package to automatically control Mass 
Frontier software to construct in silico mass spectral libraries and to 
perform spectral matching. Two case studies demonstrate that
automating Mass Frontier with this package allows researchers to
generate in silico mass spectral libraries in a high-throughput
fashion with little or no human intervention.

Availability and implementation: Documentation, examples, results 
and source code are available at
http://www.biosciences-labs.bham.ac.uk/viant/hammer/.
Contact: m.viant@bham.ac.uk 

Supplementary information: Supplementary data are available at 
Bioinformatics online. 
1 INTRODUCTION

Mass spectrometry (MS)-based metabolomics is a rapidly 
developing field that aims to detect and measure a variety of 
small biological molecules (metabolites) over a wide dynamic 
range (Dettmer et al., 2007). Although thousands of metabolites
are typically detected in an untargeted metabolomics study of a 
biological sample, their subsequent identification represents the 
most significant bottleneck in the discovery of new biochemical 
knowledge (Dunn et al., 2013; Kind and Fiehn, 2010; Wishart,
2011). In many cases, multiple empirical formulae and/or puta- 
tive chemical structures are reported for each observed mass 
feature (or more strictly mass-to-charge ratio, m/z). Multistage 
(MSⁿ) mass spectrometry, which is an experimental technique
to collect in-depth fragmentation data related to the chemical 
structure of metabolites, is often applied to increase accuracy 
and specificity in metabolite annotation and identification. 
However, experimental MSⁿ mass spectral libraries currently
do not adequately cover the search space for all metabolites present
in complex biological samples, as authentic chemical standards
are not available for all metabolites. The generation of
in silico MSⁿ libraries is anticipated to greatly improve the success
rate of annotation of metabolites detected in metabolomics
studies when experimentally acquired MSⁿ data of authentic
chemical standards are unavailable. Commercial software packages
such as Mass Frontier and ACD/MS Fragmenter predict
in silico fragmentation patterns to assist the interpretation of
experimental MSⁿ data (Krauss et al., 2010; Oppermann et al.,
2012). Additionally, a freely available tool named MetFrag has
been developed (Wolf et al., 2010). This tool selects
candidate structures from a compound library based on the 
molecular ion m/z and generates in silico substructures to subse- 
quently annotate fragment peaks. MetFrag does not provide the 
ability to create in silico patterns or libraries independent of 
the use of experimental data; however, it has been proven to 
be an efficient approach to annotate fragmentation mass spectra 
derived from MS experiments. 
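
The candidate-selection step described above can be pictured with a minimal Python sketch that filters a compound list by molecular ion m/z within a ppm tolerance. This is an illustration under stated assumptions, not MetFrag's implementation: the function name, the 10 ppm default and the two-entry library are hypothetical, and only [M+H]+ ions are considered.

```python
PROTON_MASS = 1.007276  # Da; added to a neutral mass to form the [M+H]+ ion


def select_candidates(observed_mz, library, ppm=10.0):
    """Return names of library entries whose [M+H]+ m/z lies within `ppm`.

    `library` maps compound name -> neutral monoisotopic mass in Da.
    Hypothetical helper for illustration only.
    """
    hits = []
    for name, neutral_mass in library.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
        if error_ppm <= ppm:
            hits.append(name)
    return hits


# Hypothetical two-compound library (monoisotopic masses in Da).
toy_library = {"L-phenylalanine": 165.07898, "L-tyrosine": 181.07389}
candidates = select_candidates(166.0863, toy_library)
```

In practice several structures usually share one m/z window, which is why fragment-level evidence is needed to discriminate between the candidates.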

Mass Frontier is a well-known software package that has been 
used by more than a thousand academic institutions and com- 
panies throughout the world for the management, evaluation 
and interpretation of mass spectra. It includes three resources 
that assist the user to robustly predict in silico fragmentation 
patterns: a set of general fragmentation rules, a fragmentation 
library comprising ~150 000 fragmentation mechanisms collected
from the scientific literature and user-specified mechanisms.
However, Mass Frontier cannot readily perform batch
processing on the scale required to generate comprehensive 
in silico libraries for thousands of metabolites. 

Here, we present a freely available software package, named 
HAMMER (High-throughput AutoMation of Mass frontiER), 
to automatically control Mass Frontier software to perform 
in silico fragmentation in an automated and high-throughput
manner. This package allows the user to readily generate
large-scale in silico fragmentation libraries and perform mass
spectral matching to mass spectrometry-derived data.

© The Author 2013. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

J.Zhou et al.



2 METHODS AND IMPLEMENTATION 

HAMMER is written in Python and supports Windows 7 and XP
(both 32-bit and 64-bit). Its software
requirements include Java 6/7, Open Babel (O'Boyle et al., 2011),
Sikuli (Yeh et al., 2009) and Mass Frontier 7. A desktop PC with 
2.66 GHz CPU and 4 GB memory was used as the operating 
platform. 

As shown in Figure 1, HAMMER consists of four modules: 

(1) RetrieveStructures retrieves chemical structures from a 
compound database of interest such as ChemSpider, 
Kyoto Encyclopedia of Genes and Genomes (KEGG) 
and PubChem. The retrieval is based on a submitted 
plain text file that includes a list of compounds or chem- 
icals (with specific names, database identifier or chemical 
formula). Additionally, metabolites within specific KEGG 
pathways can be downloaded based on a KEGG pathway 
identifier. Candidate entries found in the databases of 
interest are automatically downloaded, verified and con- 
verted into .mol format (required by Mass Frontier) or 
other formats such as SMILES, InChI or empirical for- 
mula. Open Babel is used to perform structure verification 
and conversion. A report is generated to summarize the 
information retrieved for each compound (e.g. number 
of candidates, database IDs and URL to the specific 
database entry). 

(2) InSilicoFragmentation automatically controls the operation
of Mass Frontier by using the open-source visual
scripting software Sikuli. Sikuli allows control of software 
when no application programming interface is available, 
which is the case for many commercial software packages. 
It uses image recognition of graphical user interface
(GUI) elements to operate software. HAMMER bundles
a standalone Sikuli runtime, so the
InSilicoFragmentation module works without any setup;
otherwise, only simple configuration is required. Two versions
of InSilicoFragmentation are provided: Windows 7 and 
Windows XP. We have separated the GUI images (or pat- 
terns) and search regions from the source code itself. This 
allows the user to modify the script more easily if required 
(e.g. applying a different version of Mass Frontier, see 
online manual for details). The structure file (.mol) of 
each compound is separately imported into Mass 
Frontier, and in silico fragmentation is performed applying 
user-defined fragmentation settings (see online manual for 
details). Structural and m/z information for each fragment 
is collected for each in silico reaction and if required 
(i.e. for MSⁿ) is used as the input for the next fragmentation step.
This feedback allows closed-loop operation of Mass Frontier
to perform 'multistage' in silico fragmentation. In silico 
fragments, their corresponding unique m/z values, chem- 
ical structures (in each stage) and fragmentation mechan- 
isms are exported for further analysis and visualization. 
Fig. 1. Workflow for HAMMER



(3) OrganiseFragments uses the in silico fragmentation results 
as its input. The fragments are organized systematically in 
separate folders according to the compound of interest, 
and a 2D chemical structure image file in .png format is
generated for each fragment. The in silico fragmentation
results are exportable in several formats, such as Extensible
Markup Language (XML) and plain text formats, as well as
Chemical Markup Language (CML) and National Institute of
Standards and Technology spectral library files (NIST
MSP) for further analysis. These formats are compatible
with common mass spectrometry databases and software
packages. The hierarchical relationships of the fragments
are written to an XML file that can
easily be accessed using scripting languages. Additionally,
a hierarchical tree visualization (as a .pdf) is generated to 
allow visual comparison or interpretation. 

(4) SpectralMatching computes a score, using the pMatch algorithm,
to evaluate the probability of a match between an
experimental fragmentation spectrum and an in silico
fragmentation pattern in a library (Ye et al., 2010). The
pMatch algorithm is a spectral matching algorithm origin- 
ally developed for mass spectrometry-based protein iden- 
tification (see Supplementary Information for details of 
the algorithm). The SpectralMatching module takes MSP
files (see previous module), both experimental and in silico, as its
input. For each experimental fragmentation spectrum, a
report is generated, which includes detailed information 
on the matching and annotation. 
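
The closed-loop 'multistage' operation of module (2) and the hierarchical XML output of module (3) can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not HAMMER's source code: `fragment_once` is a hypothetical stand-in for one Sikuli-driven Mass Frontier run, the toy rules operate on letter strings rather than chemical structures, and the XML element and attribute names are invented.

```python
import xml.etree.ElementTree as ET


def fragment_tree(structure, fragment_once, max_stage, stage=1, node=None):
    """Recursively fragment `structure` and record the hierarchy as XML.

    Each predicted fragment is fed back into `fragment_once` until
    `max_stage` is reached (stage 1 is the intact compound, stage 2 its
    direct fragments, and so on).
    """
    if node is None:
        node = ET.Element("compound", structure=structure, stage=str(stage))
    if stage < max_stage:
        for fragment in fragment_once(structure):
            child = ET.SubElement(
                node, "fragment", structure=fragment, stage=str(stage + 1)
            )
            fragment_tree(fragment, fragment_once, max_stage, stage + 1, child)
    return node


# Toy fragmentation rules (hypothetical): each string splits into the listed parts.
TOY_RULES = {"ABCD": ["ABC", "D"], "ABC": ["AB", "C"], "AB": ["A", "B"]}


def toy_fragmenter(structure):
    """Stand-in for a single in silico fragmentation run."""
    return TOY_RULES.get(structure, [])


root = fragment_tree("ABCD", toy_fragmenter, max_stage=3)

# The hierarchy is easy to query from a script, e.g. all stage-3 fragments.
ms3_fragments = [
    f.get("structure") for f in root.iter("fragment") if f.get("stage") == "3"
]
xml_text = ET.tostring(root, encoding="unicode")
```

Bounding the recursion by `max_stage` keeps the sketch finite even if a stand-in fragmenter were to return its own input; a real run would also need to store m/z values and mechanisms alongside each fragment.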



3 CASE STUDIES 

HAMMER was used to generate in silico fragmentation data for 
two distinct groups of compounds: (i) all 72 metabolites within 
the 'phenylalanine metabolism' KEGG pathway (i.e. map00360) 
and (ii) the top 200 most prescribed drugs in the USA in 2011 
(RxList, 2011). These were selected to demonstrate the applic- 
ability and capability of HAMMER across diverse compound 
sets. Some drugs are composed of multiple compounds, and each 
compound was separately parsed and imported into Mass 
Frontier. Additionally, lower mass neutral and charged mol- 
ecules/atoms (e.g. water and sodium) within these separated 
drug compounds were excluded from this demonstration dataset 
(see Supplementary Information for details). This resulted in 151 
unique structures for the drugs dataset. See Supplementary Table 
S1 for specific settings used for both case studies. The run times
for the phenylalanine and drug datasets were ~8.5 and 14.5 h, 
respectively. On average, ~3200 fragments were predicted for 
each compound (see Supplementary Tables S2-S4). Applying 
the defined Mass Frontier parameter settings, it is expected
that MSⁿ data are reported, where n > 2. This relatively high
number of fragments illustrates the complexity and diversity of
the fragmentation mechanisms. Although these in silico fragments
are predicted based on a set of general fragmentation
rules and a fragmentation library comprising ~150 000 fragmentation
mechanisms collected from the scientific literature, the
numbers of fragments produced are highly dependent on the
complexity of the chemical structures and the parameter settings
defined by the user (see Supplementary Information and online
manual for details). Some of the reported in silico fragments may be
false positives: fragments that are not actually created in the
MS fragmentation process, or that are created but are so unstable
that they fragment further or decompose completely before the ions are detected.
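
As a rough guide to throughput, the approximate run times and dataset sizes reported above can be converted into an average time per structure. The short Python calculation below is a back-of-the-envelope sketch only; it assumes the run time is spread evenly across structures, which is not stated in the text, and the dictionary keys are merely labels.

```python
# Approximate figures quoted in the text above.
run_times_h = {"phenylalanine metabolism": 8.5, "prescribed drugs": 14.5}
n_structures = {"phenylalanine metabolism": 72, "prescribed drugs": 151}

# Average wall-clock minutes spent per structure in each dataset.
minutes_per_structure = {
    name: run_times_h[name] * 60 / n_structures[name] for name in run_times_h
}
```

Under this assumption, both datasets work out to roughly 6 to 7 minutes per structure.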

To assess the applicability of each in silico fragmentation library,
five experimental fragmentation mass spectra per case study were retrieved
from MassBank to perform spectral matching [Supplementary
Table S5; Case Study I: acetyl-CoA (KNA00207), capsaicin
(WA001605), isobutyryl-CoA (PR100154), N-acetyl-L-phenylalanine
(KO002200) and succinic acid (KZ000074); Case
Study II: amoxicillin (WA001751), digoxin (WA000563), meloxicam
(WA002576), naproxen (WA000359) and prednisone
(CO000368)] (Horai et al., 2010). All 10 experimental fragmentation
spectra were correctly identified based on the data present in these
small in silico mass spectral libraries (Supplementary Table S6).
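
The library search described above can be pictured with the following minimal Python sketch, which reads a small MSP-style library and ranks its entries against an experimental peak list. It does not implement pMatch (whose details are given in the Supplementary Information); the score used here is a deliberately simple stand-in, namely the fraction of experimental peaks that fall within an m/z tolerance of a predicted fragment. The function names, the 0.01 tolerance and the two library entries with their m/z values are all hypothetical, and the parser assumes one peak per line.

```python
def parse_msp(text):
    """Parse minimal MSP text into {name: [m/z, ...]}; intensities are ignored."""
    spectra = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:  # header line such as "Name: ..." or "Num Peaks: ..."
            key, value = line.split(":", 1)
            if key.strip().lower() == "name":
                name = value.strip()
                spectra[name] = []
        elif name is not None:  # peak line: "<m/z> <intensity>"
            spectra[name].append(float(line.split()[0]))
    return spectra


def matched_fraction(experimental_mz, predicted_mz, tol=0.01):
    """Stand-in score: share of experimental peaks explained by a prediction."""
    if not experimental_mz:
        return 0.0
    matched = sum(
        1 for mz in experimental_mz
        if any(abs(mz - p) <= tol for p in predicted_mz)
    )
    return matched / len(experimental_mz)


def rank_library(experimental_mz, library, tol=0.01):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {
        name: matched_fraction(experimental_mz, peaks, tol)
        for name, peaks in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented in silico library with placeholder names and m/z values.
LIBRARY_MSP = """\
Name: compound A
Num Peaks: 3
120.08 100
103.05 100
77.04 100

Name: compound B
Num Peaks: 2
136.08 100
91.05 100
"""

library = parse_msp(LIBRARY_MSP)
ranking = rank_library([120.081, 103.054, 91.2], library)
```

A spectrum counts as identified in this sketch when the true compound is the top-ranked entry; a probabilistic score such as pMatch additionally accounts for how likely such matches are to occur by chance.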

The parameter settings used in Mass Frontier to perform
in silico fragmentation play an important role in the correct
prediction of in silico fragments and
mechanisms. Understanding the experimental parameters
used to collect the experimental data (e.g. differences in
collision energy or type of fragmentation) is likewise important to
maximize metabolite annotation. However, these two challenges
are outside the scope of the work presented here.

Both case studies show that HAMMER allows researchers to 
readily generate in silico MSⁿ libraries and perform batch spectral
matching of in silico mass spectral data to MS-derived data
in an automated high-throughput fashion with minimal or no 
human intervention. In addition, we have shown with both case 
studies the value and applicability of visual scripting in the field 
of computational biology. We anticipate this software package 
will be used across a wide range of disciplines including metabo- 
lomics, organic synthetic chemistry and pharmaceutical research. 